AITopics | branch coverage

Collaborating Authors

branch coverage

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Constrained Sampling for Language Models Should Be Easy: An MCMCPerspective

Neural Information Processing SystemsJun-17-2026, 10:16:08 GMT

Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially problematic in applications like program fuzzing, where one wants to generate diverse and valid program inputs for testing purposes. We propose a new constrained sampling framework based on Markov Chain Monte Carlo (MCMC) that simultaneously satisfies three core desiderata: constraint satisfying (every sample satisfies the constraint), monotonically converging (the sampling process converges to the true conditional distribution), and efficient (high-quality samples emerge in few steps). Our method constructs a proposal distribution over valid outputs and applies a Metropolis-Hastings acceptance criterion based on the LM's likelihood, ensuring principled and efficient exploration of the constrained space. Empirically, our sampler outperforms existing methods on both synthetic benchmarks and real-world program fuzzing tasks 1.

constraint, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Benchmarking LLMs for Unit Test Generation from Real-World Functions

Huang, Dong, Zhang, Jie M., Harman, Mark, Zhang, Qianru, Du, Mingzhe, Ng, See-Kiong

arXiv.org Artificial IntelligenceAug-4-2025

Recently, large language models (LLMs) have shown great promise in automating unit test generation, significantly reducing the manual effort required by developers. To effectively evaluate the capabilities of LLMs in this domain, it is crucial to have a well-designed benchmark that accurately reflects real-world scenarios and mitigates common pitfalls. Existing LLM test generation benchmarks are limited by two critical drawbacks: data contamination and structurally simple function code. As a result, we often cannot rely on the validity of scientific conclusions drawn from empirical studies using these limited benchmarks. The empirical evidence presented may be biased due to contamination and may fail to generalize beyond toy programs due to structural simplicity. To address these problems, we introduce ULT (UnLeakedTestbench), a new benchmark specifically designed for function-level unit test generation from real-world Python functions. ULT is constructed through a multi-stage curation process that ensures high cyclomatic complexity and mitigates test case contamination. With 3,909 carefully selected function-level tasks, ULT provides a more realistic and challenging evaluation of LLMs' test generation capabilities. We also provide PLT (PreLeakedTestbench), a pair benchmark of ULT with leaked tests designed to enable a controlled analysis of memorization versus reasoning in test generation. Our evaluation results demonstrate that ULT is significantly more challenging. For example, test cases generated by LLMs only achieve 41.32\%, 45.10\%, 30.22\%, and 40.21\% for accuracy, statement coverage, branch coverage, and mutation score on average for all LLMs, respectively. These results are substantially lower than the corresponding metrics on TestEval (91.79\%, 92.18\%, 82.04\%, and 49.69\%) and PLT (47.07\%, 55.13\%, 40.07\%, and 50.80\%).

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2508.00408

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Automated Testing of COBOL to Java Transformation

Hans, Sandeep, Kumar, Atul, Yasue, Toshikai, Ono, Kouichi, Krishnan, Saravanan, Sondhi, Devika, Satoh, Fumiko, Mitchell, Gerald, Kumar, Sachin, Saha, Diptikalyan

arXiv.org Artificial IntelligenceApr-16-2025

Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code, making manual validation of translated Java code from COBOL a necessary but time-consuming and labor-intensive process. In this paper, we share our experience of developing a testing framework for IBM Watsonx Code Assistant for Z (WCA4Z) [5], an industrial tool designed for COBOL to Java translation. The framework automates the process of testing the functional equivalence of the translated Java code against the original COBOL programs in an industry context. Our framework uses symbolic execution to generate unit tests for COBOL, mocking external calls and transforming them into JUnit tests to validate semantic equivalence with translated Java. The results not only help identify and repair any detected discrepancies but also provide feedback to improve the AI model.

large language model, machine learning, programming language, (25 more...)

arXiv.org Artificial Intelligence

2504.10548

Country: Europe > Austria (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

LLM Test Generation via Iterative Hybrid Program Analysis

Gu, Sijia, Nashid, Noor, Mesbah, Ali

arXiv.org Artificial IntelligenceMar-17-2025

Automating unit test generation remains a significant challenge, particularly for complex methods in real-world projects. While Large Language Models (LLMs) have made strides in code generation, they struggle to achieve high branch coverage due to their limited ability to reason about intricate control flow structures. To address this limitation, we introduce Panta, a technique that emulates the iterative process human developers follow when analyzing code and constructing test cases. Panta integrates static control flow analysis and dynamic code coverage analysis to systematically guide LLMs in identifying uncovered execution paths and generating better test cases. By incorporating an iterative feedback-driven mechanism, our technique continuously refines test generation based on static and dynamic path coverage insights, ensuring more comprehensive and effective testing. Our empirical evaluation, conducted on classes with high cyclomatic complexity from open-source projects, demonstrates that Panta achieves 26% higher line coverage and 23% higher branch coverage compared to the state-of-the-art.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1358

Country: North America > Canada > British Columbia (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Karanjai, Rabimba, Hussain, Aftab, Rabin, Md Rafiqul Islam, Xu, Lei, Shi, Weidong, Alipour, Mohammad Amin

arXiv.org Artificial IntelligenceJul-6-2024

Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.

llm, test case, unit test, (14 more...)

arXiv.org Artificial Intelligence

2407.05202

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Zhang, Hongxiang, Rong, Yuyang, He, Yifeng, Chen, Hao

arXiv.org Artificial IntelligenceJun-13-2024

Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional efforts in grammar and suffer from low throughput. In this paper, we explore the potential of utilizing the Large Language Model to enhance greybox fuzzing for structured data. We utilize the pre-trained knowledge of LLM about data conversion and format to generate new valid inputs. We further fine-tuned it with paired mutation seeds to learn structured format and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the power of LLM to understand and mutate structured data to fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bug trigger and bug reached. Compared to AFL++, LLAMAFUZZ achieved 27.19% more branches in real-world program sets on average. We also demonstrate a case study to explain how LLMs enhance the fuzzing process in terms of code coverage.

fuzzer, llamafuzz, llm, (14 more...)

arXiv.org Artificial Intelligence

2406.07714

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Exploring Fuzzing as Data Augmentation for Neural Test Generation

He, Yifeng, Wang, Jicheng, Rong, Yuyang, Chen, Hao

arXiv.org Artificial IntelligenceJun-12-2024

Testing is an essential part of modern software engineering to build reliable programs. As testing the software is important but expensive, automatic test case generation methods have become popular in software development. Unlike traditional search-based coverage-guided test generation like fuzzing, neural test generation backed by large language models can write tests that are semantically meaningful and can be understood by other maintainers. However, compared to regular code corpus, unit tests in the datasets are limited in amount and diversity. In this paper, we present a novel data augmentation technique **FuzzAug**, that combines the advantages of fuzzing and large language models. FuzzAug not only keeps valid program semantics in the augmented data, but also provides more diverse inputs to the function under test, helping the model to associate correct inputs embedded with the function's dynamic behaviors with the function under test. We evaluate FuzzAug's benefits by using it on a neural test generation dataset to train state-of-the-art code generation models. By augmenting the training set, our model generates test cases with $11\%$ accuracy increases. Models trained with FuzzAug generate unit test functions with double the branch coverage compared to those without it. FuzzAug can be used across various datasets to train advanced code generation models, enhancing their utility in automated software testing. Our work shows the benefits of using dynamic analysis results to enhance neural test generation. Code and data will be publicly available.

data augmentation, fuzzaug, test function, (14 more...)

arXiv.org Artificial Intelligence

2406.08665

Country:

North America > United States > California > Yolo County > Davis (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

CoverUp: Coverage-Guided LLM-Based Test Generation

Pizzorno, Juan Altmayer, Berger, Emery D.

arXiv.org Artificial IntelligenceMar-24-2024

This paper presents CoverUp, a novel system that drives the generation of high-coverage Python regression tests via a combination of coverage analysis and large-language models (LLMs). CoverUp iteratively improves coverage, interleaving coverage analysis with dialogs with the LLM to focus its attention on as yet uncovered lines and branches. The resulting test suites significantly improve coverage over the current state of the art: compared to CodaMosa, a hybrid LLM / search-based software testing system, CoverUp substantially improves coverage across the board. On a per-module basis, CoverUp achieves median line coverage of 81% (vs. 62%), branch coverage of 53% (vs. 35%) and line+branch coverage of 78% (vs. 55%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly half of its successes.

llm, module, osa, (16 more...)

arXiv.org Artificial Intelligence

2403.16218

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Breaking the Silence: the Threats of Using LLMs in Software Engineering

Sallou, June, Durieux, Thomas, Panichella, Annibale

arXiv.org Artificial IntelligenceJan-8-2024

Large Language Models (LLMs) have gained considerable traction within the Software Engineering (SE) community, impacting various SE tasks from code completion to test generation, from program repair to code summarization. Despite their promise, researchers must still be careful as numerous intricate factors can influence the outcomes of experiments involving LLMs. This paper initiates an open discussion on potential threats to the validity of LLM-based research including issues such as closed-source models, possible data leakage between LLM training data and research evaluation, and the reproducibility of LLM-based findings. In response, this paper proposes a set of guidelines tailored for SE researchers and Language Model (LM) providers to mitigate these concerns. The implications of the guidelines are illustrated using existing good practices followed by LLM providers and a practical example for SE researchers in the context of test case generation.

language model, llm, software engineering, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3639476.3639764

2312.08055

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Bhatia, Shreya, Gandhi, Tarushi, Kumar, Dhruv, Jalote, Pankaj

arXiv.org Artificial IntelligenceDec-17-2023

Generating unit tests is a crucial task in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research aims to experimentally investigate the effectiveness of LLMs, specifically exemplified by ChatGPT, for generating unit test scripts for Python programs, and how the generated test cases compare with those generated by an existing unit test generator (Pynguin). For experiments, we consider three types of code units: 1) Procedural scripts, 2) Function-based modular code, and 3) Class-based code. The generated test cases are evaluated based on criteria such as coverage, correctness, and readability. Our results show that ChatGPT's performance is comparable with Pynguin in terms of coverage. At the same time, ChatGPT's ability to generate tests is superior to Pynguin, as the latter is not able to generate test cases for Category 1. We also find that about 39% and 28% of assertions generated by ChatGPT for Category 2 and 3, respectively, were incorrect. Our results also show that there is minimal overlap in missed statements between ChatGPT and Pynguin, thus, suggesting that a combination of both tools may enhance unit test generation performance. Finally, prompt engineering improved ChatGPT's performance, achieving an average 28% coverage improvement in Category 2 and 15% improvement in Category 3 after about 4 iterations.

assertion, chatgpt, pynguin, (15 more...)

arXiv.org Artificial Intelligence

2312.10622

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > New York > New York County > New York City (0.05)
Asia > India > NCT > Delhi (0.05)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)

Add feedback